Recently, a new coronavirus was isolated from the lung tissue of autopsy samples and nasal/throat swabs of
patients with Severe Acute Respiratory Syndrome (SARS), and its causative association with SARS was
determined. To further characterize the virus and to provide insight into the molecular mechanism of
SARS etiology, a proteomic strategy was used to identify the structural proteins of SARS coronavirus
(SARS-CoV) isolated from Vero E6 cells infected with the BJ-01 strain of the virus. First, Western
blotting with convalescent sera from SARS patients demonstrated that various structural proteins of
SARS-CoV were present in the culture supernatant of virus-infected Vero E6 cells and that the
nucleocapsid (N) protein was prominently immunogenic toward the convalescent sera, while the immune
response to the spike (S) protein, probably bound to the membrane (M) glycoprotein, was much weaker.
Then, sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was used to separate the
complex protein constituents, and the gels were sliced continuously from the loading well to the bottom
to search thoroughly for the structural proteins of the virus. The proteins in the gel slices were
trypsinized in-gel and identified by mass spectrometry. Three structural proteins of SARS-CoV, the S, N
and M proteins, were identified with sequence coverages of 38.9, 93.1 and 28.1%, respectively.
Glycosylation of the S protein was also analyzed, and four glycosylation sites were discovered by comparing
the mass spectra of the peptides before and after deglycosylation with PNGase F. Matrix-assisted
laser desorption/ionization-mass spectrometry showed that the relative molecular weight of
intact N protein is 45 929 Da, which is very close to the theoretical molecular weight of 45 935 Da
calculated from the amino acid sequence deduced from the genome, with the first amino acid, methionine, removed
from the N-terminus and the second, serine, acetylated. This indicates that the predicted phosphorylation
sites are not phosphorylated, either within infected cells or in virus particles. Intriguingly, a series of shorter
isoforms of N protein was observed by SDS-PAGE and identified by mass spectrometry. To confirm
this phenomenon and investigate its mechanism, recombinant N protein of SARS-CoV
was cleaved in vitro by caspase-3 and caspase-6, respectively. The results demonstrated that these shorter isoforms
could be products of cleavage by caspase-3 rather than by caspase-6. Further, the relationship
between caspase cleavage and viral infection of the host cell is discussed.
1 Introduction


Severe Acute Respiratory Syndrome (SARS), a newly
emerged infectious disease, has seriously threatened the
health of people worldwide. By June 4 2003, 8402 probable
SARS cases with 772 deaths had been reported from 29
countries (http://www.who.int/csr/sars/country/en).
The overall case fatality was estimated at 14-15% by
WHO [1], and the mortality rate in people older than
60 years could be as high as 43-55% [2].


A number of laboratories worldwide have undertaken
research to identify the causative agent of SARS.
The isolation of an unknown virus that causes SARS
was first announced on March 22 2003 [3].


* These authors contributed equally to this work. 
A coronavirus was then identified in patients as a
possible causal agent of SARS by serological and
RT-PCR methods, and the possible route of transmission
of the virus was analyzed [4]. Further research in
different laboratories, based on cytopathological and
ultrastructural features, indicated that a new
coronavirus was associated with SARS, and genetic
characterization showed that this new coronavirus is
only distantly related to known coronaviruses [5, 6].
The genomes of different strains of SARS-associated
coronavirus (SARS-CoV) were sequenced and reported in
succession [7-10].


A new strain of SARS-CoV was isolated from the lung
tissue of autopsy samples and nasal/throat swabs of
patients with SARS and identified by morphology,
serology, animal experiments, RT-PCR and partial sequence
analysis by Qin et al. [11]. The causative association
between the isolate and SARS was also determined.
Complete genome sequencing and comparative analysis of
the isolate (BJ01, GenBank accession number AY278488)
indicated that the genome is 29.725 kb in size and has 11
ORFs [9]. The whole genome is composed of a stable
region encoding an RNA-dependent RNA polymerase
and a variable region representing four coding sequences
for viral structural proteins (the S, E, M and N proteins) and
five putative uncharacterized proteins. Its gene order is
identical to that of other known coronaviruses.


Although genome sequencing and comparative analysis
provided abundant information on the characteristics
of SARS-CoV, as well as various predictions about
the structures and functions of the proteins composing
the virus particles, information on the native proteins,
with their post-translational modifications and possible
isoforms or cleavage products, is difficult to obtain from the
genome sequence. This information is nevertheless very important
for understanding the functions of these proteins and for
revealing the properties of the virus. Systematic
proteomics research is therefore necessary to identify these
proteins in their native forms and to probe their processing
and modification directly. For this purpose, a mass
spectrometric characterization of proteins from the SARS
virus was recently reported [12]. Two antigenic proteins
with molecular masses of ~46 and ~139 kDa were
characterized, and the glycosylation of
the spike protein was determined.


In this study, the structural proteins of SARS-CoV isolate
BJ01 were investigated by proteomics strategies. Three
of the four structural proteins, all antigenic toward
convalescent sera from patients with SARS, were
characterized by SDS-PAGE and/or RP-HPLC combined
with mass spectrometry. The peptide sequences carrying
modifications and the cleavages found in these
proteins were also analyzed. The relationship between these
modifications and cleavages and their probable functions
is discussed.


2 Materials and methods 


2.1 Sample source 


The BJ01 strain of SARS-CoV was isolated from lung
tissue of deceased patients and cultured in Vero E6 cells.
When the cytopathic effect was observed in more than
75% of the cells infected with the BJ01 strain of SARS-CoV, the
culture supernatant containing virus and infected cells
was harvested. The cells were frozen and thawed repeatedly
in the medium to completely release the virus
particles. After centrifugation at 6000 rpm for 10 min (JA-25.50,
Beckman, Fullerton, CA, USA), the lysates were dialyzed,
measured at 595 nm for protein
concentration according to the Lowry method [13] and then
lyophilized. As controls, noninfected Vero E6 cells
were cultured and processed in the same way as the
infected cells.


2.2 Chemicals and reagents 


Electrophoresis reagents including acrylamide,
N,N'-methylenebisacrylamide (Bis), TEMED, Tris base, glycine, DTT
and low molecular weight marker were purchased from Amersham
Biosciences (Uppsala, Sweden). Iodoacetamide and
TFA were from Acros (New Jersey, USA). Trypsin
(sequencing grade) and DTT were obtained from Promega
(Madison, WI, USA). Endoproteinase Glu-C (sequencing
grade), PNGase F, ammonium bicarbonate and
ammonium acetic acid were purchased from Sigma (St. Louis,
MO, USA). Caspase-3 and caspase-6 were from BD
Pharmingen (San Diego, CA, USA). Acetonitrile (HPLC
grade) was purchased from J.T. Baker (Phillipsburg, NJ,
USA); formic acid (FA) was obtained from Beijing
Chemicals (Beijing, China).


2.3 SDS-PAGE 


The lyophilized samples of lysates of Vero E6 cells infected
by SARS-CoV (S) and of control cells (V) were each suspended in
loading buffer (50 mmol/L Tris-HCl pH 6.8, 100 mmol/L DTT, 2%
SDS, 0.1% bromophenol blue, 10% glycerol).
The samples were run on SDS-PAGE (T = 13%) in
Tris-glycine running buffer with 100 µg protein per lane,
and stained with Coomassie blue R250.
2.4 Western blotting


Human specific anti-SARS-CoV sera were obtained from 
10 clinical cases of convalescent SARS patients (14-28 d 
after being diagnosed as SARS) and one patient (21 d 
after onset of SARS), and the control normal human 
serum was collected from 6 uninfected donors with their 
permission. 


One group (A) of 100 µg and two groups (B and C) of 50 µg
lyophilized samples of SARS-CoV infected Vero E6 cell
lysates (S) and Vero E6 cell lysates (V) were set up for
the Western blot experiments. All three groups
were electrophoresed under the same conditions on 13%
SDS-PAGE. The separated proteins were transferred to
Hybond-P PVDF membrane (Amersham Biosciences) at
20°C for 3 h and the remaining gel was stained with
Coomassie blue for protein identification [6]. After overnight
incubation at 4°C in blocking buffer (20 mM Tris-HCl
pH 7.5, 140 mM NaCl, 0.05% Tween-20, 5% nonfat dried
milk), the membranes of group A were probed with
antiserum (1:1000, v/v in PBST, 5% nonfat
dried milk) from one clinical case of convalescent SARS
and incubated for 2 h at room temperature. Group B
membranes were probed with pooled antisera from
10 convalescent patients, all of which had been confirmed
positive for antibodies to SARS-CoV by indirect
immunofluorescence assay (IFA). Group C membranes were
probed with sera from uninfected donors (negative in IFA).
After washing in PBST 3 × 10 min, the membrane
was incubated for 1 h with horseradish peroxidase-conjugated
secondary antibody (Amersham Biosciences, 1:10 000,
v/v in PBST, 5% nonfat dried milk), and then washed in
PBST three times. Finally the blots were developed with an
ECL Western blot kit (Santa Cruz Biotechnology, Santa
Cruz, CA, USA) and reactive bands were detected by
exposure to Kodak X-Omat K film for 3 min at ambient
temperature. Bands that showed an apparent reaction
with antisera were cut out and stored at 4°C until
analyzed by MS.


2.5 In-gel digestion 


After SDS-PAGE, each lane of the gel was sliced manually into
30 strips of 2 mm from the loading well to the bottom.
The gel slices were destained with 50% ACN/25 mM
NH4HCO3, reduced with 10 mM DTT at 56°C and alkylated
in the dark with 50 mM iodoacetamide at room
temperature for 1 h. Then the gel plugs were lyophilized and
immersed in 15 µL of 10 ng/µL trypsin solution in 25 mM
NH4HCO3. The digestion was kept at 37°C for 15 h.
Tryptic peptide mixtures were first extracted with 100 µL 5%
TFA and then with the same volume of 2.5% TFA/50% ACN.
The extracts were combined, lyophilized and
used for further identification by MS. Furthermore, Glu-C
in PBS (pH 7.8), which hydrolyzes peptide bonds at the
carboxyl side of glutamyl and aspartyl residues, was also
used for in-gel protein digestion and peptide extraction to
improve sequence coverage of the nucleocapsid protein.
For the identification of the glycosylated spike protein, the
dried gel particles were suspended in 15 µL PNGase F
solution (500 U/mL) and incubated at 4°C for at least
40 min, then at 37°C overnight. During the incubation,
10-25 µL of water was added to ensure the gel plugs
were covered with liquid at all times.
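
The cleavage rules behind the two digests can be sketched in a few lines of Python. This is only an illustration, not the software used in the study: the function name `digest` is hypothetical, the trypsin rule (cleave after K or R, but not before P) is the commonly used convention and is assumed here, and the Glu-C rule (cleave after E and D) follows the description above. The two test sequences are S protein residues 554-566 and N protein residues 348-361 as listed in Tables 1 and 2.

```python
def digest(seq, cleave_after, block_before=""):
    """Cut seq on the C-terminal side of every residue in cleave_after,
    unless the following residue is listed in block_before."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in cleave_after and nxt and nxt not in block_before:
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])  # last peptide runs to the C-terminus
    return peptides

# Trypsin (assumed rule): after K/R, not before P.
tryptic = digest("DVSDFTDSVRDPK", "KR", block_before="P")
# Glu-C in phosphate buffer (rule from the text): after E and D.
gluc = digest("DNVILLNKHIDAYK", "ED")

print(tryptic)
print(gluc)
```

The sketch performs complete digestion only; peptides with missed cleavages, such as 554-566 in Table 1, would need neighbouring fragments to be joined.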


2.6 RP-HPLC 


RP-HPLC of the protein mixture was performed on a
prepacked column (50 mm × 4.6 mm id, Hypersil C18,
5 µm spherical particles with pore diameter 300 Å; Elite,
Dalian, China). The flow rate was 1.0 mL/min and the
detection wavelength was set at 280 nm. Mobile phase A
consisted of water/ACN (95/5, v/v) with 0.1% TFA. Mobile
phase B consisted of water/ACN (5/95, v/v) with 0.1%
TFA. The separation was performed by running a
nonlinear gradient: 10-90% B for 60 min, 90-100% B for
5 min, holding at 100% B for 5 min, then returning to 100%
A over 5 min and keeping the system at 100% A for 10 min
before another run. The lyophilized protein mixture was
dissolved in 8 M urea and 25-50 µL of sample was injected by
a Rheodyne injection valve (Rheodyne, Rohnert Park, CA,
USA) in multiloading mode. The chromatographic
fractions were collected and lyophilized, followed by trypsin
digestion and MS identification. For measurement of the
Mr of the nucleocapsid (N) protein, the relevant fraction
was lyophilized for MALDI-MS analysis.


2.7 Capillary RP-HPLC 


Capillary RP-HPLC of the peptide mixture was carried out
on a Micromass CapLC liquid chromatography system
including three pumps A, B and C (Micromass,
Manchester, UK). Fused silica tubing (150 mm × 75 µm id) packed
with PepMap C18, 3 µm spherical particles with pore
diameter 100 Å (LC Packings, Amsterdam, Netherlands)
was used. The flow rate was set at 2.0 µL/min and split
to ca. 0.15 µL/min prior to the precolumn and analytical
column. Samples were injected at a flow rate of 30 µL/min
with pump C by the autosampler and salts were removed
on a precolumn (320 µm × 5 mm) of PepMap C18, 3 µm
spherical particles with pore diameter 100 Å (LC
Packings). The precolumn was connected to the 10-port
switching valve, and switched to the analytical column
after the sample was desalted. Mobile phase A consisted
of water/ACN (95/5, v/v) with 0.1% FA. Mobile phase B
consisted of water/ACN (5/95, v/v) with 0.1% FA. The
separation was performed by running a nonlinear
gradient: 4% B in 0.1-3.5 min for injection; 4-50% B in
3.5-63.5 min; 50-100% B in 63.5-73.5 min; 100% B in
73.5-80 min; 100-4% B in 80-85 min. After 15 min
equilibration in 100% A, another analysis could be run. The CapLC
was coupled on-line with a Q-TOF Micro mass spectrometer
(Micromass) for detection and protein identification.
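
As an illustration of how such a multi-step gradient can be read, the sketch below stores the breakpoints given above and interpolates linearly between them to return the mobile phase composition at any time point. The names `GRADIENT` and `percent_b` are hypothetical, and the assumption that each segment is linear is ours; the pump software may ramp differently.

```python
# (time in min, %B) breakpoints transcribed from the capillary LC method above.
GRADIENT = [
    (0.1, 4.0), (3.5, 4.0),   # hold at 4% B during injection
    (63.5, 50.0),             # 4-50% B
    (73.5, 100.0),            # 50-100% B
    (80.0, 100.0),            # hold at 100% B
    (85.0, 4.0),              # return to 4% B
]

def percent_b(t):
    """Return %B at time t (min), assuming linear ramps between breakpoints."""
    if t <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # after the last breakpoint

for t in (2.0, 33.5, 68.5, 76.0):
    print(f"{t:5.1f} min -> {percent_b(t):5.1f}% B")
```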


2.8 MALDI-TOF MS 


2.8.1 Molecular weight determination 


Molecular weight measurements of proteins or peptides
were carried out on a Reflex III MALDI mass spectrometer
(Bruker Daltonics, Bremen, Germany), equipped with a
flight tube (linear mode, 1.6 m long), laser (N2, 337 nm)
and Scout 384 target system. The accelerating voltage was
20 kV and the microchannel plate (MCP) detector was operated at
1.6 kV. Mass spectra were acquired in positive mode and
300 shots were summed for each spectrum. One µL of
sample dissolved in 1% TFA was mixed with 1 µL of matrix
solution (sinapic acid; Sigma) and centrifuged, and 1 µL of
supernatant was spotted on the target. One pmol of BSA was
used to calibrate the instrument.


2.8.2 Peptide mass fingerprinting 


Mass spectra were recorded with a MALDI-R MALDI
mass spectrometer (Micromass). The instrument was
calibrated with a tryptic digest mixture of alcohol
dehydrogenase. Positive ion mass spectra were recorded in
reflectron mode with α-cyano-4-hydroxycinnamic acid
as the matrix. Samples dissolved in 0.5-1 µL of water
were crystallized with 0.5 µL of a saturated solution of
the matrix in ACN on the target. Reflectron spectra were
acquired using the delayed extraction technique in
positive ion mode with an acceleration voltage of 1.5 kV.
About 100 laser shots were summed to acquire the
spectra and MassLynx software (Micromass) was used to
process the data. Database searching was performed manually
using the MASCOT (http://www.matrixscience.com/)
or PeptIdent (http://www.expasy.ch/tools/peptident.html)
programs available on the web.


2.9 LC-ESI MS/MS 


2.9.1 Nanospray ESI MS/MS


All MS/MS measurements were carried out on a hybrid
quadrupole-time of flight mass spectrometer (Q-TOF2;
Micromass) equipped with a nanospray needle sample
introduction system operated at a spray voltage of 3000 V, an MCP
detector with a working voltage of 2250 V, and an
energy-adjustable collision cell filled with pure argon gas. Typically,
a 2 µL sample was loaded in the Nanoflow Probe Tip (Micromass),
with the sample cone operated at 25-40 V. The instrument
was controlled by MassLynx 3.5 and sequences were
read out manually in BioLynx. Generally, spectra were
generated from 100-500 MS/MS scans. External calibration
with Glu-Fib gave an accuracy of 3 ppm. A local protein
search engine, Global Server 1.1 beta, was set up with a local
NCBInr database for automatic protein identification
(using peaklist files), together with local BLAST software and the
same protein database for sequence alignment.


2.9.2 Nanoflow ESI MS/MS 


For analysis of peptide mixtures by LC-ESI MS/MS,
lyophilized peptide mixtures were dissolved in 5.5 µL of
0.1% FA in 2% ACN and injected by autosampler onto a
0.3 × 1 mm trapping column (PepMap C18; LC Packings)
using a CapLC system. Peptides were directly eluted into
a Q-TOF mass spectrometer (Q-TOF Micro; Micromass)
at 200 nL/min on a C18 column (75 µm × 15 cm; LC
Packings). MS/MS data were processed using MassLynx 3.5
and searched against the NCBInr protein sequence
database via the internet-available MS/MS ion search
program MASCOT (http://www.matrixscience.com).
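
In ESI, peptides are typically observed in several charge states, so it can help to convert between the singly protonated masses tabulated in this paper and the m/z values selected for MS/MS. The sketch below is a minimal illustration, assuming that the Calc. values in Table 1 are singly protonated monoisotopic masses ([M+H]+); the function name `mz` is hypothetical.

```python
PROTON = 1.007276  # monoisotopic mass of a proton, Da

def mz(neutral_mass, z):
    """m/z of a peptide of given neutral mass carrying z protons."""
    return (neutral_mass + z * PROTON) / z

# Calc. value for spike peptide 39-48 (GVYYPDEIFR) in Table 1,
# assumed here to be the [M+H]+ mass.
mh = 1258.6102
neutral = mh - PROTON

for z in (1, 2, 3):
    print(f"z = {z}: m/z = {mz(neutral, z):.4f}")
```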


2.10 In vitro cleavage of N protein by caspase-3 and -6


To probe the cleavage mechanism of the N protein
in Vero E6 cells infected with SARS-CoV, recombinant
N protein was used as a substrate to test whether
the protein can be cleaved by cysteine proteases,
which play a central role in cell apoptosis. Caspase-3
and caspase-6 were selected and added to separate
reactions. The reactions were carried out
in caspase reaction buffer (20 mmol/L
piperazine-N,N'-bis(2-ethanesulfonic acid)-NaOH (pH 7.2), 100 mmol/L NaCl,
2% sucrose, 0.2 mmol/L EDTA, 10 mmol/L DTT) and
incubated at 37°C for 15 h. The reaction products were analyzed by
13% SDS-PAGE.


2.11 Experimental procedures 


A flowchart of the experimental procedure for the
identification of structural proteins of SARS-CoV is shown in
Fig. 1. Because it was difficult to obtain a sufficient amount of
virus particles for the study, the original sample obtained
for analysis was a complex mixture of SARS-CoV
particles, Vero E6 cells, and culture medium. A proteomics
strategy based on SDS-PAGE was used first to separate the
complex mixture; all bands on the
gel were then sliced, digested in-gel and characterized by
peptide sequencing with LC-ESI MS/MS. To acquire
accurate molecular weight information as well as the
N-terminus of the protein, RP-HPLC was performed to separate
the protein of interest, and MALDI-MS was employed to
identify the protein by peptide mass fingerprinting.
RP-HPLC was also used to prefractionate the sample mixture
and so decrease its complexity, and MALDI-MS was used
to characterize the peptide mixture to increase the
sequence coverage of the proteins.


Figure 1. Flowchart of experimental procedures for proteome research on structural proteins of
SARS-CoV.
3 Results and discussion


3.1 Protein identification of SARS-CoV from infected cells and antigenicity analysis of viral structural proteins


The antisera from SARS patients and convalescent
patients were used for antigenicity analysis of viral
proteins. The results demonstrate that the antisera from a
single patient and from 10 convalescent patients reacted
notably with SARS-CoV-related proteins (Figs. 2A and 2B)
in the apparent mass range of approximately 21-200 kDa,
which contains two very strongly reactive bands (4 and 5)
with an apparent mass of ~46 kDa, and three much weaker
protein bands (1-3) with an apparent mass of more
than 100 kDa. The reactive proteins in the gel bands were
further identified by LC-ESI MS/MS. The putative S
glycoprotein and putative M protein were found in bands 1-3
(Fig. 2A) and bands 1 and 3 (Fig. 2B), and the putative N protein in
bands 4-9 (Fig. 2A) and 4-8 (Fig. 2B). Although different
antisera showed slight differences in antigenicity, the
main antigenic protein was identified as N protein.
Intriguingly, a strong reaction with the antisera of SARS
patients was observed when M protein (theoretical mass
25 060 Da) comigrated with S glycoprotein in the gel.
However, when the two proteins were separated in the gel,
neither showed any antigenicity. These
results imply that the antigenicity of M and S proteins
might depend on their interaction or physical binding.


Figure 2. Western blot analysis of proteins from SARS-CoV
in Vero E6 cell lysates. SARS-CoV in Vero E6 cell
lysates (S) and control Vero E6 cell lysates (V) were
separated by 13% SDS-PAGE and analyzed by Western blot.
The transferred PVDF membranes were probed with
antisera from one (A) or 10 convalescent
patients (B), or with control pooled sera from 6 noninfected
donors (C). The putative S glycoprotein and putative
M protein from SARS-CoV were identified in the bands cut
out at positions 1, 2 and 3, and the putative N protein from
SARS-CoV in bands 4-9.


Structural proteins of SARS coronavirus 497


3.2 Characterization of structural proteins of 
SARS-CoV 


Figure 3 shows the results of SDS-PAGE separation and
MS characterization of the total proteins from the
lyophilized samples of Vero E6 cells infected by SARS-CoV,
in which spike protein (S), nucleocapsid protein (N) and
membrane glycoprotein (M) were identified.
Tables 1-3 show the calculated and measured mass
values of peptides found in tryptic and Glu-C digests of
S, N and M proteins. The sequence coverage was 38.9,
93.1 and 28.1%, respectively. It should be noted that the
sequence coverage of S and N proteins is close to that


Table 1. Calculated and measured mass values of peptides found in tryptic digest of spike (S) protein


Position Calc. Meas. Error (Da) Peptide sequence MALDI ESI-QTOF
39-48 1258.6102 1258.6166 0.0064 GVYYPDEIFR + +
85-94 1114.5415 1114.5382 0.0033 DGIYFAATEK + +
189-198 1246.6446 1245.6900 0.0454 NKDGFLYVYK +
191-198 1004.5087 1004.5148 0.0061 DGFLYVYK +
199-207 1046.5629 1046.5533 0.0096 GYQPIDVVR +
208-221 1576.8733 1576.8434 0.0299 DLPSGFNTLKPIFK + +
222-232 1257.7313 1257.6700 0.0613 LPLGINITNFR +
298-306 1085.5374 1085.5242 0.0132 GIYQTSNFR +
298-315 1994.0454 1994.0020 0.0434 GIYQTSNFRVVPSGDVVR +
334-342 1154.5629 1154.5600 0.0029 FPSVYAWER + +
374-390 1990.9578 1990.9167 0.0411 LNDLC(a)FSNVYADSFVVK + +
396-411 1737.8806 1737.8700 0.0106 QIAPGQTGVIADYNYK + +
412-426 1737.8087 1737.8363 0.0276 LPDDFMGCVLAWNTR + +
427-439 1460.6652 1460.7098 0.0446 NIDATSTGNYNYK + +
448-453 817.4679 817.4553 0.0126 LRPFER +
496-514 2014.1041 2014.0075 0.0966 VVVLSFELLNAPATVC(a)GPK - +
544-553 1310.6752 1310.6885 0.0133 RFQPFQQFGR + +
545-553 1154.5741 1154.5764 0.0023 FQPFQQFGR + +
554-563 1140.5167 1140.5516 0.0349 DVSDFTDSVR + +


© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 


www.proteomics-journal.de 


498 W. Ying et al. 


Table 1. Continued


Position Calc. Meas. Error (Da) Peptide sequence MALDI ESI-QTOF
554-566 1480.6914 1480.7450 0.0536 DVSDFTDSVRDPK + +
748-758 1130.5800 1130.5587 0.0213 ALSGIAAEQDR + +
798-807 1225.6463 1225.6444 0.0019 SFIEDLLFNK + +
808-817 1052.5445 1052.5700 0.0255 VTLADAGFMK +
818-829 1395.6321 1395.6429 0.0108 QYGEC(a)LGDINAR + +
888-903 1823.9286 1823.8540 0.0746 FNGIGVTQNVLYENQK + +
912-929 1848.9913 1848.9657 0.0256 AISQIQESLTTTSTALGK + +
930-946 1868.0236 1867.9619 0.0617 LQDVVNQNAQALNTLVK + +
947-965 2021.0662 2021.0233 0.0429 QLSSNFGAISSVLNDILSR +
966-977 1414.7536 1414.7466 0.0070 LDKVEAEVQIDR +
983-996 1690.9486 1690.9485 0.0001 LQSLQTYVTQQLIR + +
1028-1055 3167.5883 3167.5330 0.0553 GYHLMSFPQAAPHGVVFLHVTYVPSQER +
1164-1173 1186.6426 1186.7100 0.0674 EIDRLNEVAK +
1238-1248 1293.5845 1293.6500 0.0655 FDEDDSEPVLK +
1238-1251 1577.7693 1577.8517 0.0824 FDEDDSEPVLKGVK +
1249-1255 817.4566 817.4588 0.0022 GVKLHYT +


a) These cysteines were modified with iodoacetamide
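

The Error (Da) column of Table 1 is the absolute difference between the calculated and measured masses. As a sketch of how the same values can be expressed as a relative error in ppm, which is easier to compare across peptides of different size, the snippet below recomputes both for three rows taken from the table. The function name `mass_error` is hypothetical and the ppm figures are our own arithmetic, not values reported by the authors.

```python
def mass_error(calc, meas):
    """Return (absolute error in Da, relative error in ppm)."""
    err_da = abs(meas - calc)
    return err_da, err_da / calc * 1e6

# (peptide, calculated, measured) copied from Table 1.
rows = [
    ("GVYYPDEIFR", 1258.6102, 1258.6166),
    ("SFIEDLLFNK", 1225.6463, 1225.6444),
    ("LQSLQTYVTQQLIR", 1690.9486, 1690.9485),
]

for peptide, calc, meas in rows:
    err_da, err_ppm = mass_error(calc, meas)
    print(f"{peptide:16s} {err_da:.4f} Da  {err_ppm:5.1f} ppm")
```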
Figure 3. Identification of viral proteins of SARS-CoV from lysates of infected
Vero E6 cells by SDS-PAGE MS. The gel was stained with
Coomassie Brilliant Blue R250.
Thirty slices were cut from the loading
well to the bottom of the gel.
The bands in which SARS-CoV
proteins were identified are
denoted S (spike protein), N
(nucleocapsid protein) and M
(membrane glycoprotein), respectively.
The results showed
that S protein mainly existed as
a highly modified protein and thus
appeared separately at about 200 kDa (slice 3). N protein existed mainly as an intact molecule at about 45 kDa
(slice 12) but, unlike the earlier report [12], there were few fragmentation bands of this protein. M protein was detected not only
at its theoretical molecular weight position (slice 22), but also at a very high position together with S protein.
Table 2. Calculated and measured mass values of peptides found in tryptic and Glu-C digest of nucleocapsid protein


Position Calc. Meas. Error (Da) Peptide sequence MALDI ESI
1-10 1144.4990 1144.5079 0.0089 Ac-SDNGPQSNQR +
11-32 2262.0493 2262.0700 0.0207 SAPRITFGGPTDSTDNNQNGGR +
15-32 1850.8263 1850.8726 0.0463 ITFGGPTDSTDNNQNGGR
33-40 926.5278 926.5300 0.0022 NGARPKQR +
41-61 2324.1894 2324.1445 0.0449 RPQGLPNNTASWFTALTQHGK
41-65 2851.4597 2851.4800 0.0203 RPQGLPNNTASWFTALTQHGKEELR +
62-68 946.5000 946.5200 0.0200 EELRFPR
63-81 a) 2095.0400 2095.0300 0.0100 ELRFPRGQGVPINTNSGPD
69-88 2151.0101 2151.0105 0.0004 GQGVPINTNSGPDDQIGYYR
94-102 947.5091 947.5657 0.0566 VRGGDGKMK +
104-118 a) 1848.8955 1848.9233 0.0278 LSPRWYFYYLGTGPE
108-127 2297.0913 2297.0747 0.0166 WYFYYLGTGPEASLPYGANK
119-128 a) 1049.5262 1049.5714 0.0452 ASLPYGANKE
128-143 1684.8904 1684.9500 0.0596 EGIVWVATEGALNTPK +
137-174 a) 4019.0733 4019.0828 0.0095 GALNTPKDHIGTRNPNNNAATVLQLPQGTTLPKGFYAE
150-169 2091.1193 2091.0713 0.0480 NPNNNAATVLQLPQGTTLPK
170-177 886.4053 886.4592 0.0539 GFYAEGSR +
178-189 1166.5508 1166.5620 0.0112 GGSQASSRSSSR +
210-226 1687.9047 1687.9534 0.0487 MASGGGETALALLLLDR
217-231 a) 1696.0003 1696.0493 0.0490 TALALLLLDRLNQLE
232-253 a) 2275.2000 2275.2200 0.0200 SKVSGKGQQQQGQTVTKKSAAE
238-249 1372.7100 1372.7900 0.0800 GQQQQGQTVTKK +
254-280 a) 3103.6984 3103.7602 0.0618 ASKKPRQKRTATKQYNVTQAFGRRGPE
267-276 1183.5854 1183.6693 0.0839 QYNVTQAFGR
277-293 1930.9365 1930.9364 0.0001 RGPEQTQGNFGDQDLIR
278-293 1774.8354 1774.8784 0.0430 GPEQTQGNFGDQDLIR
294-319 2928.3886 2928.4100 0.0214 QGTDYKHWPQIAQFAPSASAFFGMSR +
300-319 2236.0756 2236.0295 0.0461 HWPQIAQFAPSASAFFGMSR
320-338 2061.0473 2061.0054 0.0419 IGMEVTPSGTWLTYHGAIK
339-355 2015.0807 2015.0345 0.0462 LDDKDPQFKDNVILLNK
348-361 1655.9115 1655.8900 0.0215 DNVILLNKHIDAYK +
349-358 a) 1178.6891 1178.7213 0.0322 NVILLNKHID
356-369 1685.8500 1685.9100 0.0600 HIDAYKTFPPTEPK +
359-371 a) 1521.7947 1521.8481 0.0534 AYKTFPPTEPKKD
375-385 1282.6750 1282.6672 0.0078 KTDEAQPLPQR
376-385 1154.5800 1154.5985 0.0185 TDEAQPLPQR +
379-399 a) 2300.3084 2300.2952 0.0132 AQPLPQRQKKQPTVTLLPAAD
388-405 2005.0059 2005.0208 0.0149 KQPTVTLLPAADMDDFSR
389-405 1876.9109 1876.9183 0.0074 QPTVTLLPAADMDDFSR
406-421 1594.6900 1594.7800 0.0900 QLQNSMSGASADSTQA +


a) Peptides from Glu-C digest


of a Canadian report (96% and 42%). Notably, this is the 
first time that M protein has been identified in its natural 
form [12]. 
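

Sequence coverage is the fraction of residues in a protein that fall inside at least one identified peptide, so overlapping peptides must not be counted twice. The following sketch shows one way to compute it from the Position columns of the tables. The function names are hypothetical, only four overlapping S protein peptides from Table 1 are used, and the S protein length of 1255 residues is an assumption taken from the last position listed in that table; the result is therefore an illustration, not the coverage reported above.

```python
def covered_residues(ranges):
    """Count residues covered by at least one (start, end) range, inclusive."""
    covered, last_end = 0, 0
    for start, end in sorted(ranges):
        start = max(start, last_end + 1)  # skip residues already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered

def coverage_percent(ranges, protein_length):
    return 100.0 * covered_residues(ranges) / protein_length

# Four overlapping tryptic peptides of the S protein (Table 1).
peptides = [(544, 553), (545, 553), (554, 563), (554, 566)]
print(covered_residues(peptides))                  # residues 544-566
print(round(coverage_percent(peptides, 1255), 2))  # length 1255 is assumed
```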


3.2.1 Spike protein analysis 


The coronavirus spike protein is a large type I membrane
glycoprotein that contains distinct functional domains
near the amino (S1) and carboxy (S2) termini. These
spikes define viral tropism through their receptor
specificity and perhaps also through their membrane fusion
activity during virus entry into cells. In most
coronaviruses, the spike protein is cleaved post-translationally
into two subunits, S1 and S2. The peripheral
S1 portion can independently bind cellular receptors
while the integral membrane S2 portion is required to
mediate fusion of viral and cellular membranes. The
extraordinary variations in host range and tissue tropism
among coronaviruses are in large part attributable to
variations in the spike glycoprotein [14]. As a surface
glycoprotein, the spike protein may offer an attractive target for
new drugs. Elucidation of its structural features, including
glycosylation, may lead to important therapeutic
applications.


Table 3. Calculated and measured mass values of peptides found in tryptic digest of membrane glycoprotein


Position Peptide sequence Calc. Meas. Error (Da)
107-124 SMWSFNPETNILLNVPLR 2130.0900 2130.1300 0.0400
205-221 LNTDHAGSNDNIALLVQ 1793.8900 1793.9200 0.0300
171-184 TLSYYKLGASQR 1385.7300 1385.7600 0.0300
186-197 VGTDSGFAAYNR 1256.5858 1256.6400 0.0542


Here, spike protein was detected in five bands of the gel
(Fig. 3, slices 1, 2, 3, 5, 6). The Mr of slices 1, 2 and 3 was
higher than the theoretical value (139 kDa), which indicated
extensive post-translational modification.
Surprisingly, slices 5 and 6 were found at a
position significantly lower than the theoretical Mr of
intact S protein, which suggested possible cleavage of
S protein. In addition, besides S protein, M protein was
also identified in bands 1, 2, 5 and 6, where the Mr values are
significantly higher than its theoretical Mr, implying that
there was a strong interaction or physical binding
between the two proteins.


To investigate glycosylation of the spike protein, the
protein was first deglycosylated with PNGase F and then
treated with trypsin in the gel as described in
Section 2.5. On deglycosylation, the formerly glycosylated asparagine residues
are converted to aspartic acid, so the
corresponding deglycosylated peptides can be recognized by
a mass difference of 0.984 Da per
deglycosylated site from the values calculated from the predicted
sequence. As a result, four glycosylated peptides were
identified by comparing the mass spectra of the peptides before and after
deglycosylation (Table 4). Figure 4 shows
the mass spectra of glycosylated peptide T1074-1089.
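

The 0.984 Da shift can be reproduced from monoisotopic residue masses. The sketch below, which is our own illustration rather than the authors' calculation, computes the singly protonated mass of peptide T1074-1089 with asparagine and again with the asparagine replaced by aspartic acid. The residue masses are standard monoisotopic values; the helper name `mh_plus` is hypothetical, and only the residues needed for this peptide are included.

```python
# Standard monoisotopic residue masses (Da) for the residues used here.
RESIDUE = {
    "G": 57.02146, "S": 87.03203, "V": 99.06841, "T": 101.04768,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "E": 129.04259, "F": 147.06841, "R": 156.10111, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def mh_plus(seq):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER + PROTON

native = "EGVFVFNGTSWFITQR"                   # T1074-1089, sequence from Table 4
deglycosylated = native.replace("N", "D", 1)  # Asn -> Asp after PNGase F

shift = mh_plus(deglycosylated) - mh_plus(native)
print(f"{mh_plus(native):.4f}")
print(f"{mh_plus(deglycosylated):.4f}")
print(f"{shift:.3f}")
```

The value obtained for the unmodified sequence agrees with the Calculated entry for this peptide in Table 4, and the shift equals the difference between the aspartic acid and asparagine residue masses.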
Figure 4. Comparison of the same section of MALDI mass spectra of the tryptic digest of spike protein
(A) without deglycosylation and (B) after deglycosylation with PNGase F.
T1074-1089 indicates the residue numbers in the intact protein. The peak
corresponding to T1074-1089 is absent in A, but it is present in B after deglycosylation. The measured
m/z value of 1888.9420 for [M+H]+ ions corresponds to a ~0.984 Da difference from the predicted
value for nonglycosylated T1074-1089. The measured and predicted masses for the identified peaks
are shown in Table 4.
Table 4. Deglycosylated peptides found in tryptic digest of PNGase F-treated SARS spike protein


Position Peptide sequence Calculated Detected ΔM
T1137-1163 YFKNHTSPDVDLGDISGINASVVNIQK 2931.4846 2932.5305 1.0469
T778-796 YFGGFNFSQILPDPLKPTK 2169.1378 2170.1252 0.9874
T1074-1089 EGVFVFNGTSWFITQR 1887.9388 1888.9672 0.9284
T111-126 SQSVIIINNSTNVVIR 1756.9915 1758.0128 1.0213


The mass spectra showed clearly that each of the glycosylated peptides we found had only one glycosylated site. This result differs from that published by the Canadian group, in which two peptides with two glycosylation sites were characterized [12]. We also determined that peptide T222-232, which contains a potential glycosylation site, displayed no glycosylation at all.
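Potential N-glycosylation sites are conventionally predicted from the consensus sequon N-X-S/T, where X is any residue except proline. As a rough illustration, the sketch below scans the four peptide sequences of Table 4 for this sequon and reports the candidate asparagines in protein numbering. The dictionary layout and the function name `candidate_sites` are assumptions made for this example; the scan says nothing about whether a site is actually occupied.

```python
import re

# Tryptic peptides from Table 4, keyed by the residue number of the first amino acid.
PEPTIDES = {
    1137: "YFKNHTSPDVDLGDISGINASVVNIQK",
    778: "YFGGFNFSQILPDPLKPTK",
    1074: "EGVFVFNGTSWFITQR",
    111: "SQSVIIINNSTNVVIR",
}

# N-X-S/T with X != P; the lookahead keeps overlapping sequons (e.g. N-N-S-T).
SEQUON = re.compile(r"N(?=[^P][ST])")


def candidate_sites(start, sequence):
    """Return protein-level residue numbers of Asn residues that sit in a sequon."""
    return [start + m.start() for m in SEQUON.finditer(sequence)]


for start, seq in PEPTIDES.items():
    end = start + len(seq) - 1
    print(f"T{start}-{end}: {candidate_sites(start, seq)}")
```

A sequon only marks a candidate: whether it carries a glycan has to be shown experimentally, which is why a peptide can contain a potential site and still display no glycosylation.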


3.2.2 Nucleocapsid protein analysis


N protein was identified by both SDS-PAGE/mass spectrometry and RP-HPLC/mass spectrometry. The fractions separated by RP-HPLC were collected, concentrated and digested. LC-ESI MS/MS analysis showed that fraction 24 in the HPLC chromatogram contained intact N protein (Fig. 5, peak 2), so MALDI-MS was used to measure its Mr, which was determined to be 45 929 Da (Fig. 5). This result agrees with the published data [12] showing that the first amino acid, methionine, at the N-terminus of the protein is removed and the second residue, serine, is acetylated. The theoretical molecular weight calculated from the predicted amino acid sequence is 45 935 Da. Comparing the calculated molecular weight with the measured one, the relative error was less than 0.13/1000. To confirm the N-terminal amino acid sequence of N protein, Edman degradation was performed on a PVDF membrane blotted from SDS-PAGE of the N protein, using a protein sequencer according to the instrument's manual (Procise Sequencer; Applied Biosystems, Foster City, CA, USA). Only after deacetylation with TFA according to [15] was the first amino acid, serine, identified (data not shown).
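The two N-terminal modifications shift the expected mass in opposite directions. The small calculation below is a sketch that assumes standard average masses are the appropriate scale for an intact-protein measurement. It gives the net shift relative to the unprocessed translation product; the constant names are illustrative.

```python
# Average masses in Da (standard values, rounded to two decimals).
MET_RESIDUE = 131.20   # methionine residue, C5H9NOS
ACETYL_GAIN = 42.04    # acetylation replaces H with COCH3, net gain C2H2O

# Loss of the initiator Met followed by acetylation of the new N-terminal Ser.
net_shift = -MET_RESIDUE + ACETYL_GAIN
print(f"net N-terminal processing shift: {net_shift:+.2f} Da")
```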


Figure 5. Chromatogram of Vero E6 cells lysates infected by SARS-CoV and mass spectrometry spectrum of intact N protein. Chromatographic condition: column, 50 × 4.6 mm id Hypersil C18, 5 μm, 300 Å; mobile phase A: water/ACN (95/5, v/v) with 0.1% TFA; mobile phase B: water/ACN (5/95, v/v) with 0.1% TFA; flow rate: 1.0 mL/min; nonlinear gradient: 10-90% B in 60 min, 90-100% B in 5 min, retaining 100% B for 5 min, and then coming to 100% A in 5 min. 1, cleaved N protein; 2, intact N protein, fraction 24. The arrow indicates the molecular weight spectrum of intact N protein in fraction 24 corresponding to peak 2.
The phosphorylation of N protein is a well-documented phenomenon in many coronaviruses, such as murine hepatitis virus (MHV), infectious bronchitis virus (IBV), bovine coronavirus (BCV) and porcine epidemic diarrhoea virus (PEDV) [16-20], in which phosphorylation was determined by metabolic labeling of recombinant rather than natural forms. The functional significance of N protein phosphorylation is still under investigation. In MHV, a widely investigated coronavirus, phosphorylated N protein was considered to have a higher RNA-binding capacity than the unphosphorylated form, and its dephosphorylation was found to be connected with initiation of the infection [21, 22]. Because phosphorylation of the N protein may be a general phenomenon in coronaviruses, several methods were tried to confirm this hypothesis for SARS-CoV, but all failed (data not shown). These included Western blot analysis of viral proteins separated by SDS-PAGE using an anti-phosphoprotein antibody, IMAC (immobilized metal affinity chromatography) combined with LC-MS/MS to look for phosphopeptides in the tryptic digest of N protein, and phosphatase digestion to find new peptides after dephosphorylation. Although we did not find any evidence that the N protein is a phosphoprotein, its real status is still unknown because of the limited sensitivity of the methods used. Metabolic labeling using 32P should be a more sensitive method and would provide direct evidence for the phosphorylation status of SARS N protein. However, strong indirect evidence comes from the molecular weight of intact natural N protein, 45 929 Da (measured Mr), compared with the theoretical value of 45 935 Da for the almost unmodified protein (with N-terminal acetylation only and no phosphorylation). Their difference, like the relative error, was less than 0.13/1000, indicating that there is no phosphorylation in natural N protein at all.
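The indirect argument can be made explicit: each phosphate group would add roughly 80 Da (HPO3, average mass about 79.98 Da) to the intact protein, far more than the observed difference. The sketch below uses only the rounded masses quoted in the text, so the last digit of the relative error may differ from the figure given above; the search range of 0 to 5 phosphate groups is an arbitrary choice for illustration.

```python
MEASURED = 45929.0      # Da, MALDI-MS value for intact N protein
THEORETICAL = 45935.0   # Da, sequence-derived value with N-terminal acetylation
PHOSPHATE = 79.98       # Da, average mass added per phosphate group (HPO3)

difference = MEASURED - THEORETICAL
relative_error = abs(difference) / THEORETICAL

# Number of phosphate groups whose added mass best explains the difference.
best_count = min(range(0, 6), key=lambda n: abs(difference - n * PHOSPHATE))

print(f"difference     : {difference:+.0f} Da")
print(f"relative error : {relative_error * 1000:.2f} per 1000")
print(f"best-fit count : {best_count} phosphate group(s)")
```

A single phosphate would leave a residual of about 86 Da against the measured value, whereas zero phosphates leaves 6 Da, which is why the intact mass argues against phosphorylation.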


On SDS-PAGE, N protein was found in several bands distributed between 46 kDa and 20 kDa (Fig. 3). This phenomenon has also been observed late in infection in cell culture with various coronaviruses, such as transmissible gastroenteritis virus (TGEV), MHV, feline coronavirus (FIPV), BCV, avian IBV and turkey coronavirus (TCV), where it resulted from caspase cleavage in the host cell [23]. To test the role of caspases in N protein cleavage, caspase-3 and -6 were selected to cleave recombinant N protein of SARS-CoV. The recombinant N protein could be cleaved by caspase-3 but not by caspase-6, and the resulting peptide pattern was similar to that in infected cells (Fig. 6). This may imply that caspase-dependent apoptosis occurs and that caspase-3 cleaves the N protein in vivo during the late phase of virus infection. This phenomenon was not reported by the Canadian group [12].
Figure 6. Cleavage of recombinant N protein of SARS-CoV by caspase-3 and -6 in vitro. 1, recombinant N protein; 2, recombinant N protein incubated with caspase-3; 3, recombinant N protein incubated with caspase-6.


Apoptosis is an important process in development and cell defense, and it usually causes morphological and biochemical changes. Caspases, a special class of proteases participating in this process, are activated in a cascade triggered by apoptotic signals. Virus-infected cells undergo apoptosis when attacked by cytotoxic cells, including cytotoxic T cells and natural killer cells [24-26]. For coronaviruses, infection with mouse hepatitis virus strain 3 (MHV-3) has been reported to result in lethal fulminant hepatic necrosis [27]. Replication of IBV in Vero cells was found to cause extensive cytopathic effects, leading to destruction of the entire monolayer and death of the infected cells [28]. In further research on apoptosis in coronavirus infection, the E protein of MHV was confirmed as an apoptosis inducer in 17Cl-1 cells [29]. For TGEV, the nucleocapsid protein can be cleaved in vitro by caspase-6 and caspase-7 but not by caspase-3 [23]. Although apoptosis in Vero E6 cells infected by SARS-CoV has not been reported, our results may give a clue to apoptosis in the virus-infected cells, and the N protein of SARS-CoV may be a substrate of caspase-3 in vivo.


3.2.3 Membrane glycoprotein analysis 


M protein (the membrane glycoprotein or matrix protein, E1 membrane glycoprotein), a transmembrane protein, is the most abundant glycoprotein in infected cells as well as in the virus particle of known coronaviruses [30]. It has three domains: a short N-terminal ectodomain, a triple-spanning transmembrane domain in the N-terminal half of the protein, and a C-terminal endodomain. TMHMM (http://www.cbs.dtu.dk/services/TMHMM) and Tmpred (http://www.ch.embnet.org/software/TMPRED.html) analysis indicated that the three transmembrane helices are located approximately at residues 15-37, 50-72 and 77-99, with the 121-amino acid hydrophilic domain on the inside of the virus particle [7]. In our experiment, however, only the sequence from residues 107 to 221 was covered (Table 3), which may be because of the high hydrophobicity of the protein, especially at its N-terminus. Furthermore, our experimental results showed that, besides the 25 kDa position (slice 22 in Fig. 3), M protein also appeared at higher molecular weight positions, but only together with the spike protein in SDS-PAGE. Whether this results from a possible interaction between M and S proteins or from the hydrophobic character of the M protein, which might retain it at a higher molecular weight position, still needs further research.
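The relation between the predicted topology and the region observed by mass spectrometry can be checked with a simple range comparison. The following sketch uses only the residue ranges quoted above; the helper name `overlaps` is an assumption introduced for illustration.

```python
# Predicted transmembrane helices of M protein (inclusive residue ranges from the text).
HELICES = [(15, 37), (50, 72), (77, 99)]
COVERED = (107, 221)   # region identified by mass spectrometry


def overlaps(a, b):
    """True if two inclusive residue ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


hits = [h for h in HELICES if overlaps(h, COVERED)]
print("helices overlapping the covered region:", hits)
print("residues upstream of the covered region:", COVERED[0] - 1)
```

None of the predicted helices falls inside the covered region, so all three lie in the unobserved N-terminal stretch, in line with the explanation that hydrophobic segments escaped detection.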


3.2.4 Envelope protein analysis


The small envelope (E) protein has been recognized as a structural component of coronaviruses such as MHV, IBV and TGEV [29, 31-32]. Recent research shows that E protein has two major biological functions. First, MHV E protein can induce apoptosis in cells expressing it, and overexpression of the Bcl-2 oncoprotein suppresses MHV E protein-induced apoptosis, indicating that initiation of the apoptotic pathway begins upstream of Bcl-2 [33]. Second, coexpression of the genes encoding the MHV-A59 M and E proteins results in the production of virus-like particles, and membrane vesicles containing E protein can be released from cells expressing only E protein as well as from MHV-infected cells [34, 35]. These results indicate that the E protein plays a pivotal role in virus envelope formation.


In our experiments, two strategies were used to identify the E protein. First, the total proteins from Vero E6 cells infected by SARS-CoV were separated by 15% SDS-PAGE, and all bands were sliced, digested in-gel and analyzed by MS/MS. Three structural proteins (S, N and M) were identified, but E protein was not found. We then used 2-D LC-MS/MS: the total proteins from the infected cells were digested directly, and the digest was separated by 2-D capillary HPLC with strong cation exchange chromatography as the first dimension and reversed-phase chromatography as the second dimension. The protein was still not identified. It is therefore likely, as previously reported for other coronaviruses, that the E protein is present only in minute amounts in infected cells [31] and in the viral envelope. Its strong hydrophobicity (the N-terminal two-thirds of the protein is highly hydrophobic [36, 37]) also makes this protein difficult to detect. However, we cannot yet conclude that the E protein is not expressed in Vero E6 cells infected by SARS-CoV or is not present in the SARS-CoV envelope, because in known coronaviruses the E protein is not only present in the envelope but also plays an essential role in assembly.
4 Concluding remarks


Proteomic analysis was used to uncover the natural structural protein constituents of the BJ01 strain of SARS-CoV. Three structural proteins, i.e., S, N and M proteins, were identified and characterized from cultured Vero E6 cells infected with the virus, and their antigenicity was demonstrated with patient sera; of these, the M protein was identified for the first time. Glycosylation of S protein was analyzed, and four glycosylation sites were characterized. Cleavage of S protein into two subunits was also suggested. Molecular weight determination of intact N protein showed that the only post-translational modification was acetylation at its N-terminus, and no phosphorylation was detected within the entire N protein. Antigenicity analysis indicated that the N protein has prominent immunogenicity towards convalescent sera from patients with SARS. The immune response to S protein probably depends on its strong interaction with the M protein. Cleavage of recombinant N protein with caspase-3 and -6 in vitro demonstrated that the series of shorter forms of N protein observed in SDS-PAGE might be products of cleavage by caspase-3 rather than caspase-6 and might be related to the apoptosis of Vero E6 cells induced by SARS-CoV infection.


The experimental results indicate that a proteomic strategy is a very powerful method for discovering and identifying proteins in a complex system. Molecular biological information on the natural proteins of SARS-CoV, especially the processing and modification of the structural viral proteins, could complement the genomic information and provide a direct molecular basis for further research on the diagnosis, prevention and treatment of SARS.